THE ASYMPTOTIC DISTRIBUTION OF A SINGLE
EIGENVALUE GAP OF A WIGNER MATRIX
Abstract. We show that the distribution of (a suitable rescaling of) a single eigenvalue gap $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ of a random Wigner matrix ensemble in the bulk is asymptotically given by the Gaudin-Mehta distribution, if the Wigner ensemble obeys a finite moment condition and matches moments with the GUE ensemble to fourth order. This is new even in the GUE case, as prior results establishing the Gaudin-Mehta law required either an averaging in the eigenvalue index parameter $i$, or fixing the energy level $u$ instead of the eigenvalue index.

The extension from the GUE case to the Wigner case is a routine application of the Four Moment Theorem. The main difficulty is to establish the approximate independence of the eigenvalue counting function $N_{(-\infty,x)}(\tilde M_n)$ (where $\tilde M_n$ is a suitably rescaled version of $M_n$) with the event that there is no spectrum in an interval $[x, x+s]$, in the case of a GUE matrix. This will be done through some general considerations regarding determinantal processes given by a projection kernel.
1. Introduction

Given an $n \times n$ Hermitian matrix $M_n$, we let

$$\lambda_1(M_n) \le \dots \le \lambda_n(M_n)$$

be the $n$ eigenvalues of $M_n$ in non-decreasing order, counting multiplicity. The purpose of this paper is to study the eigenvalue gaps $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ of such matrices when $M_n$ is drawn from the Gaussian Unitary Ensemble (GUE), or more generally from a Wigner random matrix ensemble, in the asymptotic limit $n \to \infty$ and for a single $i = i(n)$ in the bulk region $\varepsilon n \le i \le (1-\varepsilon) n$.

To begin with, let us set out our notational conventions for GUE and Wigner 
ensembles: 

Definition 1 (Wigner and GUE). Let $n \ge 1$ be an integer (which we view as a parameter going off to infinity). An $n \times n$ Wigner Hermitian matrix $M_n$ is defined to be a random Hermitian $n \times n$ matrix $M_n = (\xi_{ij})_{1 \le i,j \le n}$, in which the $\xi_{ij}$ for $1 \le i \le j \le n$ are jointly independent with $\xi_{ji} = \overline{\xi_{ij}}$ (in particular, the $\xi_{ii}$ are real-valued), and each $\xi_{ij}$ has mean zero and variance one. We say that the Wigner
2 TERENCE TAO

matrix ensemble obeys Condition C1 with constant $C_0$ if one has

$$\sup_{i,j} \mathbf{E} |\xi_{ij}|^{C_0} \le C$$

for some constant $C$ (independent of $n$).

A GUE matrix $M_n$ is a Wigner Hermitian matrix in which $\xi_{ij}$ is drawn from the complex gaussian distribution $N(0,1)_{\mathbf C}$ (thus the real and imaginary parts are independent copies of $N(0,1/2)_{\mathbf R}$) for $i \ne j$, and $\xi_{ii}$ is drawn from the real gaussian distribution $N(0,1)_{\mathbf R}$.

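A Wigner matrix $M_n = (\xi_{ij})_{1 \le i,j \le n}$ is said to match moments to $m$-th order with another Wigner matrix $M'_n = (\xi'_{ij})_{1 \le i,j \le n}$ for some $m \ge 1$ if one has

$$\mathbf{E} (\mathrm{Re}\, \xi_{ij})^a (\mathrm{Im}\, \xi_{ij})^b = \mathbf{E} (\mathrm{Re}\, \xi'_{ij})^a (\mathrm{Im}\, \xi'_{ij})^b$$

whenever $a, b \in \mathbf N$ with $a + b \le m$.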
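As an illustration of Definition 1 (not needed for the arguments of the paper), the following minimal sketch draws a GUE matrix with the stated normalisation. The function name `sample_gue` and the use of NumPy are our own assumptions.

```python
import numpy as np

def sample_gue(n, rng):
    """Draw an n x n GUE matrix with the normalisation of Definition 1."""
    # Off-diagonal entries are N(0,1)_C: real and imaginary parts are
    # independent N(0, 1/2) variables.
    g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    upper = np.triu(g, k=1)
    # Diagonal entries are real N(0,1); the lower triangle is the conjugate
    # transpose of the upper triangle, so the matrix is Hermitian.
    return upper + upper.conj().T + np.diag(rng.normal(size=n))

rng = np.random.default_rng(0)
M = sample_gue(4, rng)
```

Each entry has mean zero and variance one, so the output is in particular a Wigner Hermitian matrix in the sense above.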

The bulk distribution of the eigenvalues $\lambda_1(M_n), \dots, \lambda_n(M_n)$ of a Wigner (and in particular, GUE) matrix is governed by the Wigner semicircle law. Indeed, if we let $N_I(M_n)$ denote the number of eigenvalues of $M_n$ in an interval $I$, and we assume Condition C1 for some $C_0 > 2$, then with probability $1 - o(1)$ (footnote 1), we have the asymptotic

$$N_{\sqrt{n} I}(M_n) = n \int_I \rho_{sc}(u)\, du + o(n)$$

uniformly in $I$, where $\rho_{sc}$ is the Wigner semi-circular distribution

$$\rho_{sc}(u) := \frac{1}{2\pi} (4 - u^2)_+^{1/2};$$

see e.g. [1]. Informally, this law indicates that the eigenvalues $\lambda_i(M_n)$ are mostly contained in the interval $[-2\sqrt{n}, 2\sqrt{n}]$, and for any energy level $u$ in the bulk region $-2 + \varepsilon \le u \le 2 - \varepsilon$ for some fixed $\varepsilon > 0$, the average eigenvalue spacing should be $\frac{1}{\sqrt{n} \rho_{sc}(u)}$ near $\sqrt{n} u$.
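The semicircle law can be checked numerically on a single sample. The sketch below is only an illustration under our own choices (NumPy, the matrix size $n = 400$, the interval $[-1, 0.5]$, and the helper names `rho_sc`, `semicircle_mass`, `sample_gue`); it compares the eigenvalue count in $\sqrt{n} I$ with $n \int_I \rho_{sc}$.

```python
import numpy as np

def rho_sc(u):
    """Semicircular density (1/2pi) (4 - u^2)_+^{1/2}."""
    u = np.asarray(u, dtype=float)
    return np.sqrt(np.clip(4.0 - u ** 2, 0.0, None)) / (2.0 * np.pi)

def semicircle_mass(a, b, points=20001):
    """Trapezoid-rule approximation to the integral of rho_sc over [a, b]."""
    u = np.linspace(a, b, points)
    vals = rho_sc(u)
    return (u[1] - u[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def sample_gue(n, rng):
    """GUE matrix normalised as in Definition 1."""
    g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    upper = np.triu(g, k=1)
    return upper + upper.conj().T + np.diag(rng.normal(size=n))

n = 400
rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(sample_gue(n, rng))

a, b = -1.0, 0.5                       # the interval I
count = int(np.sum((eigs >= a * np.sqrt(n)) & (eigs <= b * np.sqrt(n))))
predicted = n * semicircle_mass(a, b)  # n times the semicircle mass of I
```

For this matrix size one expects `count` and `predicted` to differ by far less than $n$, in line with the $o(n)$ error term.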

Now let $M_n$ be drawn from GUE. The distribution of the eigenvalues $\lambda_1(M_n), \dots, \lambda_n(M_n)$ is then well-understood. If we define the $k$-point correlation functions $\rho^{(n)}_k : \mathbf R^k \to \mathbf R^+$ for $0 \le k \le n$ to be the unique symmetric continuous function for which

$$\mathbf E \sum_{1 \le i_1 < \dots < i_k \le n} F(\lambda_{i_1}(M_n), \dots, \lambda_{i_k}(M_n)) = \int_{\mathbf R^k} F(x_1, \dots, x_k) \rho^{(n)}_k(x_1, \dots, x_k)\, dx_1 \dots dx_k$$

for any continuous function $F$ which is compactly supported in the region $\{x_1 < \dots < x_k\}$, then one has the well-known formula of Dyson [9]

$$\rho^{(n)}_n(x_1, \dots, x_n) = \frac{1}{(2\pi)^{n/2}\, 1! \cdots (n-1)!}\, e^{-|x|^2/2} \prod_{1 \le i < j \le n} |x_i - x_j|^2$$

and the Gaudin-Mehta formula

$$\rho^{(n)}_k(x_1, \dots, x_k) = \det(K^{(n)}(x_i, x_j))_{1 \le i,j \le k}$$



1 See Section 2 for the conventions for asymptotic notation such as $o(1)$ that are used in this paper.
where $K^{(n)}(x,y)$ is the kernel

$$K^{(n)}(x,y) := \sum_{k=0}^{n-1} P_k(x) e^{-x^2/4} P_k(y) e^{-y^2/4} \tag{1}$$

and $P_0(x), P_1(x), \dots$ are the $L^2$-normalised orthogonal polynomials with respect to the measure $e^{-x^2/2}\, dx$ (and are thus essentially Hermite polynomials); see e.g. [19] or [1]. In particular, the functions $P_k(x) e^{-x^2/4}$ for $k = 0, \dots, n-1$ are an orthonormal basis to the subspace $V^{(n)}$ of $L^2(\mathbf R) = L^2(\mathbf R, dx)$ spanned by $x^i e^{-x^2/4}$ for $i = 0, \dots, n-1$, thus the orthogonal projection $P^{(n)}$ to this subspace is given by the formula

$$P^{(n)} f(x) = \int_{\mathbf R} K^{(n)}(x,y) f(y)\, dy$$

for any $f \in L^2(\mathbf R)$.

Applying the inclusion-exclusion formula, the Gaudin-Mehta formula implies that for any interval (footnote 2) $I$, the probability $\mathbf P(N_I(M_n) = 0)$ that $M_n$ has no eigenvalues in $I$, where $N_I(M_n)$ is the number of eigenvalues of $M_n$ in $I$, is equal to

$$\mathbf P(N_I(M_n) = 0) = \sum_{k=0}^n \frac{(-1)^k}{k!} \int_I \dots \int_I \det(K^{(n)}(x_i, x_j))_{1 \le i,j \le k}\, dx_1 \dots dx_k.$$

One can also express this probability as a Fredholm determinant

$$\mathbf P(N_I(M_n) = 0) = \det(1 - 1_I P^{(n)} 1_I),$$

where we view the indicator function $1_I$ as a multiplier operator on $L^2(\mathbf R)$.
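The kernel (1) and the Fredholm determinant above are easy to evaluate numerically for small $n$. The sketch below is an illustration under our own assumptions: the functions $P_k(x) e^{-x^2/4}$ are generated by the standard three-term recurrence for normalised Hermite functions, and since $P^{(n)}$ has rank $n$, the determinant $\det(1 - 1_I P^{(n)} 1_I)$ is computed as $\det(1 - G)$, where $G$ is the $n \times n$ Gram matrix of the basis functions restricted to $I$ (these two determinants agree because the two operators share their non-zero eigenvalues). All function names are hypothetical.

```python
import math
import numpy as np

def hermite_functions(n, x):
    """Row k holds phi_k(x) = P_k(x) exp(-x^2/4), k = 0..n-1 (orthonormal in L^2(R))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    phi = np.zeros((n, x.size))
    phi[0] = (2.0 * np.pi) ** -0.25 * np.exp(-x ** 2 / 4.0)
    for k in range(n - 1):
        prev = phi[k - 1] if k > 0 else 0.0
        phi[k + 1] = (x * phi[k] - np.sqrt(k) * prev) / np.sqrt(k + 1)
    return phi

def gue_kernel(n, x, y):
    """The kernel K^(n)(x, y) of (1), evaluated on the grid x times y."""
    return hermite_functions(n, x).T @ hermite_functions(n, y)

def gap_probability(n, a, b, nodes=200):
    """P(N_[a,b](M_n) = 0) = det(1 - G), with G_jk the integral of phi_j phi_k over [a, b]."""
    t, w = np.polynomial.legendre.leggauss(nodes)   # Gauss-Legendre rule on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (a + b)
    w = 0.5 * (b - a) * w
    phi = hermite_functions(n, x)
    gram = (phi * w) @ phi.T
    return float(np.linalg.det(np.eye(n) - gram))

# Sanity check: for n = 1 the single eigenvalue is a standard real gaussian.
p_empty = gap_probability(1, -1.0, 1.0)
p_gauss = 1.0 - math.erf(1.0 / math.sqrt(2.0))
```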

The asymptotics of $K^{(n)}$ as $n \to \infty$ are also well understood, especially in the bulk of the spectrum (footnote 3), which in this normalisation corresponds to the interval $[(-2 + \varepsilon)\sqrt{n}, (2 - \varepsilon)\sqrt{n}]$ for any fixed $\varepsilon > 0$. Indeed, if $-2 + \varepsilon \le u \le 2 - \varepsilon$ and $x, y$ are bounded uniformly in $n$, then from the Plancherel-Rotach asymptotics for Hermite polynomials one has

$$\frac{1}{\sqrt{n} \rho_{sc}(u)} K^{(n)}\left(u\sqrt{n} + \frac{x}{\sqrt{n} \rho_{sc}(u)},\ u\sqrt{n} + \frac{y}{\sqrt{n} \rho_{sc}(u)}\right) = K_{Sine}(x,y) + o(1) \tag{2}$$

where $K_{Sine}$ is the Dyson sine kernel

$$K_{Sine}(x,y) := \frac{\sin(\pi(x-y))}{\pi(x-y)}$$

with the usual convention that $K_{Sine}(x,x) = 1$; see e.g. [19], [1], or [8, Corollary 1]. Note that the normalisations in (2) are consistent with the heuristic, from the



2 One can generalise this formula from intervals to arbitrary Borel measurable sets, but in our applications we will only need the interval case. Similarly for many of the other determinantal process identities used in this paper.

3 For the edge of the spectrum, one can control individual eigenvalues instead by the Tracy-Widom law [29]. There is however a transitional regime between the bulk and the edge which is not covered by either our results or by the Tracy-Widom law, e.g. when $\min(i, n-i)$ is comparable to $n^\theta$ for some fixed $0 < \theta < 1$, and which may be worth further attention. Given that convergence to the sine kernel is also known in such regimes, one would expect the main results of this paper to extend to these settings, although to make this intuition rigorous would require some careful argument which we do not pursue here.




Wigner semi-circular law, that the mean eigenvalue spacing at $u\sqrt{n}$ is $\frac{1}{\sqrt{n} \rho_{sc}(u)}$. We observe that the Dyson sine kernel is also the kernel to the orthogonal projection $P_{Sine}$ to those functions $f \in L^2(\mathbf R, dx)$ whose Fourier transform

$$\hat f(\xi) := \int_{\mathbf R} e^{-2\pi\sqrt{-1} x \xi} f(x)\, dx$$

is supported on the interval $[-1/2, 1/2]$.
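The convergence (2) can be observed numerically already for moderate $n$. The following sketch is only illustrative: it evaluates the rescaled kernel through the three-term recurrence for normalised Hermite functions (our own choice of method) and compares it with the sine kernel at a few test points that we picked arbitrarily.

```python
import numpy as np

def hermite_functions(n, x):
    """Row k holds phi_k(x) = P_k(x) exp(-x^2/4) for k = 0..n-1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    phi = np.zeros((n, x.size))
    phi[0] = (2.0 * np.pi) ** -0.25 * np.exp(-x ** 2 / 4.0)
    for k in range(n - 1):
        prev = phi[k - 1] if k > 0 else 0.0
        phi[k + 1] = (x * phi[k] - np.sqrt(k) * prev) / np.sqrt(k + 1)
    return phi

def rescaled_kernel(n, u, x, y):
    """Left-hand side of (2) at energy level u and local coordinates x, y."""
    scale = np.sqrt(n) * np.sqrt(4.0 - u ** 2) / (2.0 * np.pi)  # sqrt(n) rho_sc(u)
    pts = np.array([u * np.sqrt(n) + x / scale, u * np.sqrt(n) + y / scale])
    phi = hermite_functions(n, pts)
    return float(phi[:, 0] @ phi[:, 1]) / scale

def sine_kernel(x, y):
    """Dyson sine kernel; np.sinc(t) is sin(pi t)/(pi t) with value 1 at t = 0."""
    return float(np.sinc(x - y))

n = 400
errors = [abs(rescaled_kernel(n, u, x, y) - sine_kernel(x, y))
          for u in (0.0, 0.5) for (x, y) in ((0.3, -0.5), (0.0, 0.0), (1.2, 0.4))]
```

In this experiment the entries of `errors` are small compared with 1, consistent with the $o(1)$ term in (2).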

From (2) and some careful treatment of error terms (see e.g. [1, Chapter 3]) one obtains that

$$\mathbf P\left(N_{u\sqrt{n} + \frac{1}{\sqrt{n}\rho_{sc}(u)} I}(M_n) = 0\right) = \sum_{k=0}^\infty \frac{(-1)^k}{k!} \int_I \dots \int_I \det(K_{Sine}(x_i, x_j))_{1 \le i,j \le k}\, dx_1 \dots dx_k + o(1),$$

or in Fredholm determinant form,

$$\mathbf P\left(N_{u\sqrt{n} + \frac{1}{\sqrt{n}\rho_{sc}(u)} I}(M_n) = 0\right) = \det(1 - 1_I P_{Sine} 1_I) + o(1).$$

Note that the kernel $K_{Sine}(x,y) 1_I(y)$ of $P_{Sine} 1_I$ is square-integrable, and so $P_{Sine} 1_I$ is in the Hilbert-Schmidt class, and so $1_I P_{Sine} 1_I = (P_{Sine} 1_I)^* (P_{Sine} 1_I)$ is trace class.

This asymptotic can in turn be used to control the averaged gap spacing distribution. Indeed, if $1 \le t_n \le n$ is any sequence such that $1/t_n, t_n/n = o(1)$, then for any $-2 + \varepsilon \le u \le 2 - \varepsilon$ and $s > 0$ independent of $n$, the quantity

$$S(s, t_n, u, M_n) := \frac{1}{2 t_n} \left| \left\{ i : |\lambda_i(M_n) - \sqrt{n} u| \le \frac{t_n}{\sqrt{n} \rho_{sc}(u)};\ \lambda_{i+1}(M_n) - \lambda_i(M_n) \le \frac{s}{\sqrt{n} \rho_{sc}(u)} \right\} \right|$$

has the asymptotic

$$S(s, t_n, u, M_n) = \int_0^s p(y)\, dy + o(1) \tag{3}$$

where $p$ is the Gaudin distribution (or Gaudin-Mehta distribution)

$$p(y) := \frac{d^2}{dy^2} \det(1 - 1_{[0,y]} P_{Sine} 1_{[0,y]}),$$

or equivalently

$$\det(1 - 1_{[0,y]} P_{Sine} 1_{[0,y]}) = \int_y^\infty p(z)(z - y)\, dz; \tag{4}$$

see [7] for details. The quantity $\det(1 - 1_{[0,y]} P_{Sine} 1_{[0,y]})$ (and hence the Gaudin distribution $p(y)$) can also be expressed in terms of a solution to a Painlevé V ordinary differential equation. More precisely, one has

$$\det(1 - 1_{[0,y]} P_{Sine} 1_{[0,y]}) = \exp\left( \int_0^{\pi y} \frac{\sigma(x)}{x}\, dx \right)$$

where $\sigma$ solves the ODE

$$(x \sigma'')^2 + 4 (x \sigma' - \sigma)(x \sigma' - \sigma + (\sigma')^2) = 0$$


with boundary condition $\sigma(x) \sim -\frac{x}{\pi}$ as $x \to 0$; see [15] (or the later treatment in [30]). Among other things, this implies that the Gaudin distribution $p$ and all of its derivatives are smooth, bounded, and rapidly decreasing on $(0, +\infty)$.
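For readers who wish to experiment, the Fredholm determinant $\det(1 - 1_{[0,y]} P_{Sine} 1_{[0,y]})$ can be approximated by discretising the sine kernel at Gauss-Legendre nodes (a Nyström-type quadrature method). This is a numerical sketch under our own assumptions, not the method used in [7], [15] or [30]; the second derivative defining $p$ is approximated here by a central difference with a step size `h` that we chose ad hoc.

```python
import numpy as np

def sine_gap_probability(y, nodes=40):
    """Approximate det(1 - 1_[0,y] P_Sine 1_[0,y]) by Gauss-Legendre discretisation."""
    t, w = np.polynomial.legendre.leggauss(nodes)
    x = 0.5 * y * (t + 1.0)              # quadrature nodes in [0, y]
    root_w = np.sqrt(0.5 * y * w)        # square roots of the quadrature weights
    kernel = np.sinc(x[:, None] - x[None, :])   # sin(pi(x - x'))/(pi(x - x'))
    return float(np.linalg.det(np.eye(nodes) - root_w[:, None] * kernel * root_w[None, :]))

def gaudin_density(y, h=1e-3):
    """Central-difference approximation to p(y); requires y >= h."""
    return (sine_gap_probability(y + h) - 2.0 * sine_gap_probability(y)
            + sine_gap_probability(y - h)) / h ** 2

gap_at_tenth = sine_gap_probability(0.1)
density_at_one = gaudin_density(1.0)
```

For small $y$ the determinant is close to $1 - y$, which matches the boundary condition $\sigma(x) \sim -x/\pi$ above.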

We also remark that the extreme values of the gaps $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ are also well understood; see [2]. However, our focus here will be on the bulk distribution of these gaps rather than on the tail behaviour.

In [25], a Four Moment Theorem for the eigenvalues of Wigner matrices was established, which roughly speaking asserts that the fine scale statistics of these eigenvalues depend only on the first four moments of the coefficients of the Wigner matrix, so long as some decay condition (such as Condition C1) is obeyed. In particular, by applying this theorem to the asymptotic (3) for GUE matrices, one obtains

Corollary 2. The asymptotic (3) is also valid for Wigner matrices $M_n$ which obey Condition C1 for some sufficiently large absolute constant $C_0$, and which match moments with GUE to fourth order.



Proof. See [25, Theorem 9]. Strictly speaking, the arguments in that paper require an exponential decay hypothesis on the coefficients of $M_n$ rather than a finite moment condition, because the four moment theorem in that paper also has a similar requirement. However, the refinement to the four moment theorem established in the subsequent paper [26] (or in the later papers [27], [17]) relaxes that exponential decay condition to a finite moment condition. □



We remark that the moment matching hypothesis in this corollary can in fact be 
removed by combining the above argument with some similar results obtained (by 
a different method) in [16], [10]; see [11]. 

The Wigner semi-circle law predicts that the location of an individual eigenvalue $\lambda_i(M_n)$ of a Wigner or GUE matrix $M_n$ for $i$ in the bulk region $\varepsilon n \le i \le (1 - \varepsilon) n$ should be approximately $\sqrt{n} u$, where $u = u_{i/n}$ is the classical location of the eigenvalue, given by the formula

$$\int_{-\infty}^u \rho_{sc}(y)\, dy = \frac{i}{n}. \tag{5}$$
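The classical location defined by (5) has no closed form in $i/n$, but it is easy to compute. A minimal sketch, assuming the explicit antiderivative of the semicircular density and a plain bisection (the function names are hypothetical):

```python
import math

def semicircle_cdf(u):
    """Integral of rho_sc over (-infinity, u], for rho_sc(y) = sqrt(4 - y^2)/(2 pi)."""
    u = min(max(u, -2.0), 2.0)
    return 0.5 + u * math.sqrt(4.0 - u * u) / (4.0 * math.pi) + math.asin(u / 2.0) / math.pi

def classical_location(i, n, tol=1e-12):
    """Solve (5) for u by bisection on [-2, 2]."""
    target = i / n
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u_mid = classical_location(50, 100)       # i/n = 1/2, the centre of the spectrum
u_quarter = classical_location(25, 100)
```

By symmetry of the semicircular density, $i/n = 1/2$ corresponds to $u = 0$, and the eigenvalue $\lambda_i(M_n)$ is then expected near $\sqrt{n} u = 0$.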

Indeed, it is a result of Gustavsson [14] that

$$\frac{\lambda_i(M_n) - \sqrt{n} u}{\sqrt{\log n / 2\pi^2} / (\sqrt{n} \rho_{sc}(u))}$$

converges in distribution to the standard real Gaussian distribution $N(0,1)_{\mathbf R}$, or more informally that

$$\lambda_i(M_n) \approx N\left( \sqrt{n} u,\ \frac{\log n / 2\pi^2}{(\sqrt{n} \rho_{sc}(u))^2} \right)_{\mathbf R}. \tag{6}$$



4 Indeed, the famous Wigner surmise predicts the reasonably accurate approximation $p(x) \approx \frac{1}{2} \pi x e^{-\pi x^2/4}$; see e.g. [19].
Note that the standard deviation $\frac{\sqrt{\log n / 2\pi^2}}{\sqrt{n} \rho_{sc}(u)}$ here exceeds the mean eigenvalue spacing $\frac{1}{\sqrt{n} \rho_{sc}(u)}$ by a factor comparable to $\sqrt{\log n}$. If one heuristically applies this approximation (6) to the gap distribution law (3), one is led to the conjecture that the normalised eigenvalue gap

$$\frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/(\sqrt{n} \rho_{sc}(u))}$$

should converge in distribution to the Gaudin distribution, in the sense that

$$\mathbf P\left( \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/(\sqrt{n} \rho_{sc}(u))} \le s \right) = \int_0^s p(y)\, dy + o(1) \tag{7}$$

for any fixed $s > 0$.

Unfortunately, this is not quite a rigorous proof of (7). The problem is that the asymptotic (3) involves not just a single eigenvalue gap $\lambda_{i+1} - \lambda_i$, but is instead an average over all eigenvalue gaps near the energy level $\sqrt{n} u$. By (6), one is then forced to consider the contributions of at least $\gg \sqrt{\log n}$ different values of $i$ that could contribute to (3). One would of course expect the behaviour of $\lambda_{i+1} - \lambda_i$ for adjacent values of $i$ to be essentially identical, in which case one could pass from the averaged gap distribution (3) to the individual gap distribution (7). However, it is a priori conceivable (though admittedly quite strange) that there is non-trivial dependence on $i$, for instance that $\lambda_{i+1} - \lambda_i$ might tend to be larger than predicted by the Gaudin distribution for even $i$, and smaller than predicted for odd $i$, with the two effects canceling out in averaged statistics such as (3), but not in non-averaged statistics such as (7).

Our main result rules out such a pathological possibility: 

Theorem 3 (Individual gap spacing). Let $M_n$ be drawn from GUE, and let $\varepsilon n \le i \le (1 - \varepsilon) n$ for some fixed $\varepsilon > 0$. Then one has the asymptotic (7) for any fixed $s > 0$, where $u = u_{i/n}$ is given by (5).
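Theorem 3 lends itself to a simple Monte Carlo experiment. The sketch below is purely illustrative: it samples GUE matrices of a fixed size, takes the single index $i = n/2$ (for which $u = 0$ and $\rho_{sc}(0) = 1/\pi$), and records the normalised gap appearing in (7). The sizes `n`, `trials` and the seed are arbitrary choices of ours.

```python
import numpy as np

def sample_gue(n, rng):
    """GUE matrix normalised as in Definition 1."""
    g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0)
    upper = np.triu(g, k=1)
    return upper + upper.conj().T + np.diag(rng.normal(size=n))

n, trials = 100, 200
i = n // 2                              # 1-indexed eigenvalue index with i/n = 1/2
scale = np.sqrt(n) / np.pi              # sqrt(n) * rho_sc(0)
rng = np.random.default_rng(0)

gaps = np.empty(trials)
for t in range(trials):
    eigs = np.linalg.eigvalsh(sample_gue(n, rng))   # sorted in increasing order
    gaps[t] = (eigs[i] - eigs[i - 1]) * scale       # lambda_{i+1} - lambda_i, normalised

empirical_mean = gaps.mean()
empirical_cdf_at_one = np.mean(gaps <= 1.0)         # estimate of the left side of (7) at s = 1
```

By (4) with $y = 0$, the Gaudin distribution has mean one, so `empirical_mean` should be close to 1 up to sampling error.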

Applying the four moment theorem from [25] (with the extension to the finite 
moment setting in [26]), one obtains an immediate corollary: 

Corollary 4. The conclusion of Theorem 3 is also valid for Wigner matrices $M_n$ which obey Condition C1 for some sufficiently large absolute constant $C_0$, and which match moments with GUE to fourth order.



Proof. This can be established by repeating the proof of [25, Theorem 9] (in fact the 
argument is even simpler than this, because one is working with a single eigenvalue 
gap rather than with an average, and can proceed more analogously to the proof 
of [25, Corollary 21]). We omit the details. □ 

In view of the results in [11], it is natural to conjecture that the moment matching condition can be removed. Following [11], it would be natural to use heat flow methods to do so, in particular by trying to extend Theorem 3 to the gauss divisible ensembles studied in [16]. However, the methods in this paper rely very heavily on the determinantal form of the joint eigenvalue distribution of GUE (and not just on control of the $k$-point correlation functions); the formulae in [16] also have some determinantal structure, but it is unclear to us whether this similarity of structure is sufficient to replicate the arguments (footnote 5). On the other hand, we expect analogues of Theorem 3 to be establishable for other ensembles with a determinantal form, such as GOE and GSE, or to more general $\beta$ ensembles involving a non-quadratic potential for the classical values $1, 2, 4$ of $\beta$. We will not pursue these matters here.

The key to proving Theorem 3 lies in establishing the approximate independence (footnote 6) of the eigenvalue counting function $N_{(-\infty,x)}(\tilde M_n)$ from the event that $\tilde M_n$ has no eigenvalues in a short interval $[x, x+s]$ (i.e. that $N_{[x,x+s]}(\tilde M_n) = 0$), where $\tilde M_n$ is a suitably rescaled version of $M_n$. Roughly speaking, this independence, coupled with a central limit theorem for $N_{(-\infty,x)}(\tilde M_n)$, will imply that the distribution of a gap $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ is essentially invariant with respect to small changes in the $i$ parameter. To obtain this approximate independence, we use the properties of determinantal processes, and in particular the fact that a determinantal point process $\Sigma$, when conditioned on the event that a given interval such as $[x, x+s]$ contains no elements of $\Sigma$, remains a determinantal point process (though with a slightly different kernel). The main difficulty is then to ensure that the new kernel is close to the old kernel in a suitable sense (more specifically, we will compare the two kernels in the nuclear norm $S^1$).

We thank Peter Forrester, Van Vu, and the anonymous referee for corrections. 



2. Notation 

In this paper, $n$ will be an asymptotic parameter going to infinity. A quantity is said to be fixed if it does not depend on $n$; if a quantity is not specified as fixed, then it is permitted to vary with $n$. Given two quantities $X, Y$, we write $X = O(Y)$, $X \ll Y$, or $Y \gg X$ if we have $|X| \le CY$ for some fixed $C$, and $X = o(Y)$ if $X/Y$ goes to zero as $n \to \infty$.

An interval will be a connected subset of the real line, which may possibly be half-infinite or infinite. If $I$ is an interval, we use $I^c := \mathbf R \setminus I$ to denote its complement.



5 By using heat flow methods such as the method of local relaxation flow [12], one can obtain control on energy-averaged correlation functions in this setting, and similarly for non-classical $\beta$-ensembles $\beta \ne 1, 2, 4$ as was done recently in [5]. Such bounds are sufficient to obtain averaged gap information of the form (3) (at least for values of $t_n$ that grow faster than logarithmic), but it is not obvious how to isolate a single eigenvalue gap to then obtain (7).

6 It may be surprising to the experts that the counting functions on $(-\infty, x)$ and $[x, x+s]$ are approximately independent, as the intervals are adjacent. The point is that while there is a correlation between the two counting functions, the covariance between them is essentially of order $O(1)$, whilst the variance of $N_{(-\infty,x)}$ is of order $\log n$, and so the correlation between the two random variables ends up being asymptotically negligible. To put it another way, most of the random fluctuation of $N_{(-\infty,x)}$ comes from the portion of the spectrum that is far away from $x$, and this contribution will be almost completely decoupled from the spectrum at $[x, x+s]$.
We use $\sqrt{-1}$ to denote the imaginary unit, in order to free up the symbol $i$ for other purposes, such as indexing eigenvalues.

Given a bounded operator $A$ on a Hilbert space $H$, we denote the operator norm of $A$ as $\|A\|_{op}$. We will also need the Hilbert-Schmidt norm (or Frobenius norm)

$$\|A\|_{HS} := (\operatorname{trace}(A^* A))^{1/2} = (\operatorname{trace}(A A^*))^{1/2},$$

with the convention that this norm is infinite if $A^* A$ or $A A^*$ is not trace class. Similarly, we will need the Schatten 1-norm (or nuclear norm)

$$\|A\|_{S^1} := \operatorname{trace}((A^* A)^{1/2}) = \operatorname{trace}((A A^*)^{1/2}),$$

which is finite when $A$ is trace class. Note that if $A$ is compact with non-zero singular values $\sigma_1, \sigma_2, \dots$ then we have

$$\|A\|_{op} = \sup_i |\sigma_i|$$

$$\|A\|_{HS} = \Big( \sum_i |\sigma_i|^2 \Big)^{1/2}$$

$$\|A\|_{S^1} = \sum_i |\sigma_i|.$$

Indeed, one should view the operator, Hilbert-Schmidt, and nuclear norms as non-commutative versions of the $\ell^\infty$, $\ell^2$, and $\ell^1$ norms respectively.

For us, the reason for introducing the nuclear norm $S^1$ is that it controls the trace:

$$|\operatorname{trace} A| \le \|A\|_{S^1}.$$

On the other hand, the Hilbert-Schmidt and operator norms are significantly easier to estimate than the nuclear norm. To bridge the gap, we will rely heavily on the non-commutative Hölder inequalities

$$\|AB\|_{op} \le \|A\|_{op} \|B\|_{op}$$
$$\|AB\|_{HS} \le \|A\|_{op} \|B\|_{HS}$$
$$\|AB\|_{HS} \le \|A\|_{HS} \|B\|_{op}$$
$$\|AB\|_{S^1} \le \|A\|_{op} \|B\|_{S^1}$$
$$\|AB\|_{S^1} \le \|A\|_{S^1} \|B\|_{op}$$
$$\|AB\|_{S^1} \le \|A\|_{HS} \|B\|_{HS};$$

see e.g. [4]. We will use these inequalities in this paper without further comment.
We remark that for integral operators

$$T f(x) := \int_{\mathbf R} K(x,y) f(y)\, dy$$

on $L^2(\mathbf R)$ for locally integrable $K$, the Hilbert-Schmidt norm of $T$ is given by

$$\|T\|_{HS} = \Big( \int_{\mathbf R} \int_{\mathbf R} |K(x,y)|^2\, dx\, dy \Big)^{1/2}$$

when the right-hand side is finite.
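In finite dimensions all three norms can be read off from the singular values, which gives a quick way to sanity-check the inequalities above. The following sketch uses random complex matrices of an arbitrary size chosen by us; the helper name `schatten_norms` is hypothetical.

```python
import numpy as np

def schatten_norms(a):
    """Operator, Hilbert-Schmidt and nuclear norms from the singular values of a."""
    s = np.linalg.svd(a, compute_uv=False)
    return {"op": s.max(), "hs": np.sqrt(np.sum(s ** 2)), "s1": s.sum()}

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
B = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
na, nb, nab = schatten_norms(A), schatten_norms(B), schatten_norms(A @ B)

# The six non-commutative Holder inequalities, as (left side, right side) pairs.
holder_pairs = [
    (nab["op"], na["op"] * nb["op"]),
    (nab["hs"], na["op"] * nb["hs"]),
    (nab["hs"], na["hs"] * nb["op"]),
    (nab["s1"], na["op"] * nb["s1"]),
    (nab["s1"], na["s1"] * nb["op"]),
    (nab["s1"], na["hs"] * nb["hs"]),
]
trace_bound_holds = abs(np.trace(A)) <= na["s1"] + 1e-12
```

For matrices the Hilbert-Schmidt norm computed this way agrees with the entrywise formula, the discrete analogue of the integral formula for $\|T\|_{HS}$.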
3. Some general theory of determinantal processes

In this section we record some of the theory of determinantal processes which we will need. We will not attempt to exhaustively describe this theory here, referring the interested reader to the surveys [21], [23] or [20] instead. We will also not aim for maximum generality in this section, restricting attention to determinantal processes on $\mathbf R$, whose associated locally trace class operator $P$ will usually be an orthogonal projection, and often of finite rank.

Define a good kernel to be a locally integrable function $K : \mathbf R \times \mathbf R \to \mathbf C$, such that the associated integral operator

$$P f(x) := \int_{\mathbf R} K(x,y) f(y)\, dy$$

can be extended from $C_c(\mathbf R)$ to a self-adjoint bounded operator on $L^2(\mathbf R)$, with spectrum in $[0,1]$. Furthermore, we require that $P$ be locally trace class in the sense that for every compact interval $I$, the operator $1_I P 1_I$ is trace class; this will for instance be the case if $K$ is smooth. If $K$ is a good kernel, then (as was shown in [18], [23]; see also [20] or [1]), $K$ defines a point process $\Sigma \subset \mathbf R$, i.e. a random subset (footnote 7) of $\mathbf R$ that is almost surely locally finite, with the $k$-point correlation functions

$$\rho_k(x_1, \dots, x_k) := \det(K(x_i, x_j))_{1 \le i,j \le k} \tag{8}$$

for any $k \ge 0$, thus

$$\mathbf E \prod_{i=1}^k \#(\Sigma \cap I_i) = \int_{I_1 \times \dots \times I_k} \rho_k(x_1, \dots, x_k)\, dx_1 \dots dx_k$$

for any disjoint intervals $I_1, \dots, I_k$. This process is known as the determinantal point process with kernel $K$.

The distribution of a determinantal point process in an interval $I$ is described by the following lemma:

Lemma 5. Let $\Sigma$ be a determinantal point process on $\mathbf R$ associated to a good kernel $K$ and associated operator $P$. Let $I$ be a compact interval, and suppose that the operator $1_I P 1_I$ has non-zero eigenvalues $\lambda_1, \lambda_2, \dots \in (0,1]$. Then $\#(\Sigma \cap I)$ has the same distribution as $\sum_i \xi_i$, where the $\xi_i$ are jointly independent Bernoulli random variables, with each $\xi_i$ equalling 1 with probability $\lambda_i$ and 0 with probability $1 - \lambda_i$. In particular, one has

$$\mathbf E \#(\Sigma \cap I) = \sum_i \lambda_i = \operatorname{trace}(1_I P 1_I)$$

and

$$\mathbf{Var}\, \#(\Sigma \cap I) = \sum_i (1 - \lambda_i) \lambda_i = \operatorname{trace}((1 - 1_I P 1_I) 1_I P 1_I),$$


7 Strictly speaking, a point process is permitted to have multiplicity, so that it becomes a multiset rather than a set. However, as we are restricting attention to kernels $K$ which are locally integrable, the determinantal point processes we consider will be almost surely simple, in the sense that no multiplicity occurs.
and

$$\mathbf P(\#(\Sigma \cap I) = 0) = \prod_i (1 - \lambda_i) = \det(1 - 1_I P 1_I).$$



Proof. See e.g. [1, Corollary 4.2.24]. □

As a corollary of Lemma 5, we see that $\mathbf P(\#(\Sigma \cap I) = 0) > 0$ unless $P$ has an eigenfunction of eigenvalue 1 that is supported on $I$.
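The Bernoulli description in Lemma 5 determines the full law of $\#(\Sigma \cap I)$ once the eigenvalues $\lambda_i$ are known. As a small sketch (with a hypothetical list of eigenvalues chosen only for illustration), the probability mass function is obtained by repeated convolution, and the three identities of the lemma can be checked directly:

```python
import numpy as np

def count_distribution(eigenvalues):
    """pmf of a sum of independent Bernoulli(lambda_i) variables; entry k is P(count = k)."""
    pmf = np.array([1.0])
    for lam in eigenvalues:
        pmf = np.convolve(pmf, [1.0 - lam, lam])
    return pmf

# Hypothetical non-zero eigenvalues of 1_I P 1_I, for illustration only.
lams = np.array([0.9, 0.5, 0.2, 0.05])
pmf = count_distribution(lams)

k = np.arange(pmf.size)
mean = float(np.sum(k * pmf))                  # should equal the trace, sum(lams)
var = float(np.sum((k - mean) ** 2 * pmf))     # should equal sum(lams * (1 - lams))
p_empty = float(pmf[0])                        # should equal prod(1 - lams)
```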

An important special case of determinantal point processes arises when the operator $P$ is an orthogonal projection of some finite rank $n$, which is the situation with the GUE point process $\{\lambda_1(M_n), \dots, \lambda_n(M_n)\}$, which as discussed in the introduction is a determinantal point process with kernel $K^{(n)}$ given by (1). In this case, the hypotheses on $P$ (i.e. self-adjoint trace class with eigenvalues in $[0,1]$) are automatically satisfied, and the determinantal point process $\Sigma$ is almost surely a set of cardinality $n$; see e.g. [23], [20] or [1]. In this situation, the $k$-point correlation functions $\rho_k$ vanish for $k > n$, and for $k < n$ we have the Gaudin lemma

$$\rho_k(x_1, \dots, x_k) = \frac{1}{n-k} \int_{\mathbf R} \rho_{k+1}(x_1, \dots, x_{k+1})\, dx_{k+1} \tag{9}$$

which allows one to recursively obtain the correlation functions from the $n$-point correlation function $\rho_n$ (which is essentially the joint density function of the $n$ elements of $\Sigma$). Note that (9) in fact holds for any point process whose cardinality is almost surely $n$, if the process is almost surely simple with locally integrable correlation functions.

If $V$ is the $n$-dimensional range of $P$, and $\phi_1, \dots, \phi_n$ is an orthonormal basis for $V$, then the kernel $K$ of the orthogonal projection $P$ can be expressed explicitly as

$$K(x,y) = \sum_{i=1}^n \phi_i(x) \overline{\phi_i(y)}$$

and thus (by the basic formula $\det(A^* A) = |\det(A)|^2$)

$$\rho_n(x_1, \dots, x_n) = |\det(\phi_i(x_j))_{1 \le i,j \le n}|^2. \tag{10}$$

This leads to the following consequence: 

Proposition 6 (Exclusion of an interval). Let $\Sigma$ be a determinantal process associated to the orthogonal projection $P_V$ to an $n$-dimensional subspace $V$ of $L^2(\mathbf R)$. Let $I$ be a compact interval, and suppose that no non-trivial element of $V$ is supported in $I$. Then the event $E := (\#(\Sigma \cap I) = 0)$ occurs with non-zero probability, and upon conditioning to this event $E$, the resulting random variable $(\Sigma|E)$ is a determinantal point process associated to the orthogonal projection $P_{1_{I^c} V}$ to the $n$-dimensional subspace $1_{I^c} V$ of $L^2(\mathbf R)$.

Proof. This is a continuous variant of [21, Proposition 6.3], and can be proven as follows. By construction, $P_V$ has no eigenvector of eigenvalue 1 supported in $I$, and so $\mathbf P(E) = \det(1 - 1_I P_V 1_I)$ is non-zero. The point process $(\Sigma|E)$ clearly has cardinality $n$ almost surely, and is thus described by its $n$-point correlation function, which is a constant multiple of

$$\rho_n(x_1, \dots, x_n) 1_{I^c}(x_1) \dots 1_{I^c}(x_n),$$

which by (10) can be written as

$$|\det(\phi_i 1_{I^c}(x_j))_{1 \le i,j \le n}|^2, \tag{11}$$

where $\phi_1, \dots, \phi_n$ is an orthonormal basis for $V$.

By hypothesis on $V$, $\phi_1 1_{I^c}, \dots, \phi_n 1_{I^c}$ is a (not necessarily orthonormal) basis for $1_{I^c} V$. By row operations, we can thus write (11) as a constant multiple of

$$|\det(\phi'_i(x_j))_{1 \le i,j \le n}|^2,$$

where $\phi'_1, \dots, \phi'_n$ is an orthonormal basis for $1_{I^c} V$. But this is the $n$-point correlation function for the determinantal point process of $P_{1_{I^c} V}$. As the $n$-point correlation function of an $n$-point process integrates to $n!$ (cf. (9)), we see that the $n$-point correlation function of $(\Sigma|E)$ must be exactly equal to that of the determinantal point process of $P_{1_{I^c} V}$, as claimed. □



It is likely that the above proposition can be extended to infinite-dimensional 
projections (possibly after imposing some additional regularity hypotheses), but 
we will not pursue this matter here. 

We saw in Lemma 5 that if $\Sigma$ is a determinantal point process and $I$ is a compact interval, then the random variable $\#(\Sigma \cap I)$ is the sum of independent Bernoulli random variables. If the variance of this sum is large, then such a sum should converge to a gaussian, by the central limit theorem. Examples of such central limit theorems for $\#(\Sigma \cap I)$ were formalised in [6], [24]. We will need a slight variant of these theorems, which gives uniform convergence on the probability density function of $\#(\Sigma \cap I)$ as opposed to the probability distribution function.

Lemma 7 (Discrete density version of central limit theorem). Let $X = \xi_1 + \dots + \xi_n$ be the sum of independent Bernoulli random variables $\xi_1, \dots, \xi_n$, with mean $\mathbf E X = \mu$ and variance $\mathbf{Var}\, X = \sigma^2$ for some $\sigma > 0$. Then for any integer $m$, one has

$$\mathbf P(X = m) = \frac{1}{\sqrt{2\pi} \sigma} e^{-(m-\mu)^2/2\sigma^2} + O(\sigma^{-1.7})$$

(footnote 8).

8 One can improve the error term here to $O(\sigma^{-2})$ with a little more effort, but any error better than $1/\sigma$ will suffice for our purposes.



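Proof. We may assume that $\sigma$ is larger than any given absolute constant, as the claim is trivial otherwise. We use the Fourier-analytic method. Write $p_i := \mathbf E \xi_i$, then

$$\mu = \sum_{i=1}^n p_i$$

and

$$\sigma^2 = \sum_{i=1}^n p_i (1 - p_i). \tag{12}$$

Observe that $X$ has characteristic function

$$\mathbf E e^{2\pi\sqrt{-1} t X} = \prod_{i=1}^n \left( (1 - p_i) + p_i e^{2\pi\sqrt{-1} t} \right)$$

and so

$$\mathbf P(X = m) = \int_{-1/2}^{1/2} \prod_{i=1}^n \left( (1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} \right) e^{-2\pi\sqrt{-1} (m - \mu) t}\, dt.$$

We can rewrite this integral slightly as $A + B$, where

$$A := \int_{|t| \le \sigma^{-0.9}} \prod_{i=1}^n \left( (1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} \right) e^{-2\pi\sqrt{-1} (m - \mu) t}\, dt$$

and

$$B := \int_{\sigma^{-0.9} < |t| \le 1/2} \prod_{i=1}^n \left( (1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} \right) e^{-2\pi\sqrt{-1} (m - \mu) t}\, dt.$$

We first control $A$. From Taylor expansion one has

$$(1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} = \exp\left( -2\pi^2 p_i (1 - p_i) t^2 + O(p_i (1 - p_i) |t|^3) \right)$$

in this regime, and so by (12)

$$\prod_{i=1}^n \left( (1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} \right) = \exp\left( -2\pi^2 \sigma^2 t^2 + O(\sigma^{-0.7}) \right).$$

We therefore have

$$A = \int_{|t| \le \sigma^{-0.9}} (1 + O(\sigma^{-0.7})) e^{-2\pi^2 \sigma^2 t^2} e^{-2\pi\sqrt{-1} (m - \mu) t}\, dt.$$

Since

$$\int_{\mathbf R} e^{-2\pi^2 \sigma^2 t^2} e^{-2\pi\sqrt{-1} (m - \mu) t}\, dt = \frac{1}{\sqrt{2\pi} \sigma} e^{-(m - \mu)^2 / 2\sigma^2}$$

and

$$\int_{\mathbf R} e^{-2\pi^2 \sigma^2 t^2}\, dt = \frac{1}{\sqrt{2\pi} \sigma}$$

and

$$\int_{|t| > \sigma^{-0.9}} e^{-2\pi^2 \sigma^2 t^2}\, dt = O(\sigma^{-100})$$

(say), we conclude that

$$A = \frac{1}{\sqrt{2\pi} \sigma} e^{-(m - \mu)^2 / 2\sigma^2} + O(\sigma^{-1.7}). \tag{13}$$

Now we control $B$. Elementary computation shows that

$$\left| (1 - p_i) e^{-2\pi\sqrt{-1} p_i t} + p_i e^{2\pi\sqrt{-1} (1 - p_i) t} \right| \le \exp(-c p_i (1 - p_i) t^2)$$

in this regime for some absolute constant $c > 0$. By (12), we may thus bound

$$|B| \le \int_{|t| \ge \sigma^{-0.9}} e^{-c \sigma^2 t^2}\, dt$$

and so $B = O(\sigma^{-1.7})$. Combining this with (13), the claim follows. □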
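Lemma 7 can be illustrated numerically by computing the exact law of a sum of independent Bernoulli variables and comparing it with the gaussian density. In the sketch below the success probabilities are drawn at random (an arbitrary choice of ours); since the implied constant in the $O(\sigma^{-1.7})$ term is not specified, the comparison with $\sigma^{-1.7}$ is only indicative.

```python
import numpy as np

def bernoulli_sum_pmf(p):
    """Exact pmf of X = xi_1 + ... + xi_n with independent xi_i ~ Bernoulli(p_i)."""
    pmf = np.array([1.0])
    for p_i in p:
        pmf = np.convolve(pmf, [1.0 - p_i, p_i])
    return pmf

def gaussian_density(m, mu, sigma):
    """The main term of Lemma 7."""
    return np.exp(-(m - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=400)        # hypothetical success probabilities
mu = p.sum()
sigma = np.sqrt(np.sum(p * (1.0 - p)))     # see (12)

pmf = bernoulli_sum_pmf(p)
m = np.arange(pmf.size)
max_error = float(np.max(np.abs(pmf - gaussian_density(m, mu, sigma))))
```

In this experiment `max_error` is much smaller than the peak height $1/(\sqrt{2\pi}\sigma)$ of the gaussian term, which is the regime in which the lemma carries information.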

Combining Lemma 5, Proposition 6, and Lemma 7 we immediately obtain 

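Corollary 8. Let $\Sigma$ be a determinantal point process on $\mathbf R$ whose kernel $P$ is an orthogonal projection to an $n$-dimensional subspace $V$ of $L^2(\mathbf R)$. Let $I$ be a compact interval, and $m$ be an integer. Then

$$\mathbf P(\#(\Sigma \cap I) = m) = \frac{1}{\sqrt{2\pi} \sigma} e^{-(m - \mu)^2 / 2\sigma^2} + O(\sigma^{-1.7}),$$

where

$$\mu := \operatorname{trace}(1_I P 1_I)$$

and

$$\sigma^2 := \operatorname{trace}((1 - 1_I P 1_I)(1_I P 1_I)).$$

Furthermore, if $J$ is another compact interval disjoint from $I$, such that no non-trivial element of $V$ is supported on $J$, then

$$\mathbf P(\#(\Sigma \cap I) = m \,|\, \#(\Sigma \cap J) = 0) = \frac{1}{\sqrt{2\pi} \tilde\sigma} e^{-(m - \tilde\mu)^2 / 2\tilde\sigma^2} + O(\tilde\sigma^{-1.7}),$$

where

$$\tilde\mu := \operatorname{trace}(\tilde P 1_I)$$

and

$$\tilde\sigma^2 := \operatorname{trace}(\tilde P 1_{I^c} \tilde P 1_I),$$

and $\tilde P$ is the orthogonal projection to $1_{J^c} V$.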

Let the notation be as in the above corollary. In our application, we will need to determine the extent to which events such as $\#(\Sigma \cap I) = m$ and $\#(\Sigma \cap J) = 0$ are independent. In view of Corollary 8, it is then natural to determine the extent to which the projection $\tilde P$ differs from $P$.

Observe that as no non-trivial element of $V$ is supported on $J$, the operator $P 1_{J^c} P$, viewed as a map from $V$ to $V$, is invertible. Denoting its inverse on $V$ by $(P 1_{J^c} P)_V^{-1}$, we then see that the operator $1_{J^c} P (P 1_{J^c} P)_V^{-1} P 1_{J^c}$ is self-adjoint, idempotent, and has $1_{J^c} V$ as its range, and so must be equal to $\tilde P$:

$$\tilde P = 1_{J^c} P (P 1_{J^c} P)_V^{-1} P 1_{J^c}. \tag{14}$$
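The identity (14) is purely linear-algebraic, so it can be tested in a finite-dimensional toy model in which $L^2(\mathbf R)$ is replaced by $\mathbf R^N$ and the indicator $1_{J^c}$ by a diagonal 0/1 matrix. The sketch below is such a model under our own assumptions (random subspace, arbitrary sizes); it compares formula (14) with the projection onto $1_{J^c} V$ obtained by direct orthonormalisation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 12, 3                                    # hypothetical ambient and subspace dimensions

Q, _ = np.linalg.qr(rng.normal(size=(N, n)))    # columns: orthonormal basis of V
P = Q @ Q.T                                     # orthogonal projection onto V

in_J = np.zeros(N, dtype=bool)
in_J[:4] = True                                 # the "interval" J: first four coordinates
one_Jc = np.diag((~in_J).astype(float))         # multiplication by the indicator of J^c

# Direct construction: orthonormalise the restricted basis spanning 1_{J^c} V.
Qr, _ = np.linalg.qr(one_Jc @ Q)
P_tilde_direct = Qr @ Qr.T

# Formula (14): in the basis Q, the operator P 1_{J^c} P restricted to V has
# matrix Q^T 1_{J^c} Q, which is invertible for a generic subspace V.
small = Q.T @ one_Jc @ Q
P_tilde_formula = one_Jc @ P @ (Q @ np.linalg.inv(small) @ Q.T) @ P @ one_Jc
```

The two matrices agree up to rounding error, and the result is self-adjoint and idempotent as asserted above.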

We can also write $(P 1_{J^c} P)_V^{-1}$ as $(1 - P 1_J P)_V^{-1}$ (since $P$ is the identity on $V$). Thus, by Neumann series, we formally have the expansion

$$\tilde P = 1_{J^c} P 1_{J^c} + 1_{J^c} P 1_J P 1_{J^c} + 1_{J^c} P 1_J P 1_J P 1_{J^c} + \dots$$

This expansion is convergent for sufficiently small $J$, but does not necessarily converge for $J$ large. However, in practice we will be able to invert $1 - P 1_J P$ by a perturbation argument involving the Fredholm alternative. More precisely, in our application, the finite rank projection $P$ will be "close" in some weak sense to an infinite rank projection $P_0$ (in our application, $P_0$ will be the Dyson projection $P_{Sine}$), projecting to some infinite-dimensional Hilbert space $V_0$. We will assume $P_0$ to be locally trace class, so that $P_0 1_J P_0$ is compact. If we assume that no non-trivial element of $V_0$ is supported on $J$, then the Fredholm alternative (see e.g. [22, Theorem VI.14]) implies that $1 - P_0 1_J P_0 : V_0 \to V_0$ is invertible, with inverse $(1 - P_0 1_J P_0)_{V_0}^{-1} = 1 + K_0$ for some compact operator $K_0$, thus

$$(1 + K_0)(1 - P_0 1_J P_0) = 1 \tag{15}$$

on $L^2(\mathbf R)$. As $1 - P_0 1_J P_0$ is self-adjoint and is the identity on $V_0^\perp$, we see that $K_0$ has range in $V_0$ and cokernel in $V_0^\perp$, thus

$$K_0 = P_0 K_0 = K_0 P_0 = P_0 K_0 P_0.$$
One then expects 1 + PKoP : V — > V to be an approximate inverse to 1 — PljP. 
Indeed, we have 

(16) (l + PK P)(l-PljP) = l + E 

where 

E:=PK P-P(l + K )PljP 

Meanwhile, from (15) we have 

(17) Ko = (I + Ko) PoljPo 
and thus 

E = P(1 + K )(P ljPo - PljP)P. 
Let us now bound some norms of E. As the projection operator P has an operator 
norm of at most 1 , one has 

lollop < (l + ||^o||op)||Poij-Po - PljP||o P ; 

splitting PoljPo - PljP as (P - P)ljPo - Plj(Po - P) we conclude that 

lollop < (1 + ||#0||op)(||(P0 - P)l,/Poj|op + ||(P - P)ljP||op). 

If we now make the hypothesis that 

(18) ||(Po-P)l./||o P < 1 | 

4(1 + ||A ||opj 

then we have ||i?|| op < 1/2, and so we have the Neumann series 

(1 + E)- 1 = 1 -E + E 2 - .... 

In particular, 

||(l + S)- 1 -l|| sl <2||i?|| sl . 
To bound the right-hand side, we use the triangle inequality to obtain

$\|E\|_{S^1} \le (1 + \|K_0\|_{op}) ( \|P_0 1_J P_0\|_{S^1} + \|P 1_J P\|_{S^1} ).$

Factorising $P_0 1_J P_0 = (1_J P_0)^* (1_J P_0)$ and similarly for $P 1_J P$, we conclude that

$\|(1 + E)^{-1} - 1\|_{S^1} \le 2 (1 + \|K_0\|_{op}) ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$

Note that $E$ maps $V$ to itself, and so $(1 + E)^{-1}$ can also be viewed as an operator
from $V$ to itself (being the identity on $V^\perp$). From (16) one then has

$(P 1_{J^c} P)^{-1} = (1 + E)^{-1} (1 + P K_0 P)$

and thus

$\|(P 1_{J^c} P)^{-1} - (1 + P K_0 P)\|_{S^1} \le 2 (1 + \|K_0\|_{op})^2 ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$

Applying (14), we conclude that

$\|\tilde P - 1_{J^c} P (1 + P K_0 P) P 1_{J^c}\|_{S^1} \le 2 (1 + \|K_0\|_{op})^2 ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$

To deal with the $P K_0 P$ term we observe from (17) and the factorisation
$P_0 1_J P_0 = (1_J P_0)^* (1_J P_0)$ that

$\|K_0\|_{S^1} \le (1 + \|K_0\|_{op}) \|1_J P_0\|_{HS}^2,$

and so

$\|\tilde P - 1_{J^c} P 1_{J^c}\|_{S^1} \le 3 (1 + \|K_0\|_{op})^2 ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$

We summarise the above discussion as a proposition: 

Proposition 9 (Approximate description of $\tilde P$). Let $P$ be a projection to an
$n$-dimensional subspace $V$ of $L^2(\mathbb{R})$, and let $J$ be a compact interval such
that no non-trivial element of $V$ is supported on $J$. Let $P_0$ be a projection to a
(possibly infinite-dimensional) subspace $V_0$ of $L^2(\mathbb{R})$ which is locally trace
class, and such that no non-trivial element of $V_0$ is supported on $J$. Let
$K_0 : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ be the compact operator solving (15) that is
provided by the Fredholm alternative. Suppose that

(19)    $\|(P_0 - P) 1_J\|_{op} \le \frac{1}{4 (1 + \|K_0\|_{op})}.$

Let $\tilde P$ be the orthogonal projection to $1_{J^c} V$. Then

$\|\tilde P - 1_{J^c} P 1_{J^c}\|_{S^1} \le 3 (1 + \|K_0\|_{op})^2 ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$
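A rank-one, finite-dimensional toy model gives some feeling for the size of the left-hand side. In the sketch below (an illustration only, not part of the argument) $L^2(\mathbb{R})$ is replaced by $\mathbb{R}^6$, the interval $J$ by a hypothetical set of coordinates, and $V$ by the span of a single unit vector `v`. In this case the trace norm of the difference equals $\|1_J P\|_{HS}^2$ exactly, which is one of the two terms on the right-hand side. The toy does not exercise the roles of $P_0$, $K_0$ or hypothesis (19).

```python
from math import sqrt

def outer(a, b):
    return [[x * y for y in b] for x in a]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical toy data: a unit vector v spanning V, and J = {0, 1}.
raw = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
norm = sqrt(sum(x * x for x in raw))
v = [x / norm for x in raw]
J = {0, 1}

# w = 1_{J^c} v; it is non-zero because v is not supported on J.
w = [0.0 if k in J else x for k, x in enumerate(v)]
w_norm_sq = sum(x * x for x in w)

P_tilde = [[x / w_norm_sq for x in row] for row in outer(w, w)]  # projection onto span(w)
compressed = outer(w, w)                                           # 1_{J^c} P 1_{J^c} with P = v v^T
diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(P_tilde, compressed)]

# diff is a non-negative multiple of w w^T, so its trace norm equals its trace.
trace_norm = trace(diff)
hs_sq = sum(v[k] ** 2 for k in J)  # |1_J P|_HS^2 = |1_J v|^2 when P = v v^T
print(trace_norm, hs_sq)
```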

Because the $S^1$ norm controls the trace, this proposition allows us to compare the
quantities $\mu, \sigma^2$ from Corollary 8 with their counterparts $\tilde\mu, \tilde\sigma^2$:

Corollary 10. Let $n, P, V, J, P_0, K_0$ be as in Proposition 9 (in particular, we make
the hypothesis (19)). Let $I$ be a compact interval disjoint from $J$, and let
$\mu, \sigma^2, \tilde\mu, \tilde\sigma^2$ be as in Corollary 8. Then we have

$\tilde\mu = \mu + O(M)$

and

$\tilde\sigma^2 = \sigma^2 + O(M)$

where $M$ is the quantity

$M := (1 + \|K_0\|_{op})^2 ( \|1_J P_0\|_{HS}^2 + \|1_J P\|_{HS}^2 ).$
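The first claim can be unpacked in one line. The sketch below assumes, in line with the definition of $\mu'$ used in Section 4, that $\mu = \operatorname{trace}(P 1_I)$ and $\tilde\mu = \operatorname{trace}(\tilde P 1_I)$; Corollary 8 is not restated here, so this reading is an assumption rather than a quotation.

```latex
% Assumed: \mu = trace(P 1_I), \tilde\mu = trace(\tilde P 1_I), and I \subset J^c.
% Since 1_{J^c} 1_I = 1_I, cyclicity of the trace gives
%   trace(1_{J^c} P 1_{J^c} 1_I) = trace(P 1_I) = \mu.
\begin{align*}
|\tilde\mu - \mu|
  = \bigl| \operatorname{trace}\bigl( (\tilde P - 1_{J^c} P 1_{J^c}) 1_I \bigr) \bigr|
  \le \| \tilde P - 1_{J^c} P 1_{J^c} \|_{S^1} \, \| 1_I \|_{op}
  \le 3M .
\end{align*}
```

The last step is Proposition 9 together with $\|1_I\|_{op} \le 1$.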

In practice, this corollary will allow us to show that the random variable
$\#(\Sigma \cap I)$ is essentially independent of the event $\#(\Sigma \cap J) = 0$ for
certain determinantal point processes $\Sigma$ and disjoint intervals $I, J$.

4. Proof of main theorem

We are now ready to prove Theorem 3. We may of course assume that n is larger 
than any given absolute constant. 

Let $n, M_n, \epsilon, i, u$ be as in Theorem 3, and let $X$ be the random variable

$X := \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/(\sqrt{n} \rho_{sc}(u))}.$

Clearly $X$ takes values in $\mathbb{R}^+$ almost surely. Our task is to show that

$P(X \le s) = \int_0^s p(y)\,dy + o(1)$

for all fixed $s > 0$, or equivalently that

(20)    $P(X > s) = \int_s^\infty p(y)\,dy + o(1)$

for all fixed $s > 0$.

It will suffice to show that

(21)    $E (X - s)_+ = \int_s^\infty (y - s) p(y)\,dy + o(1)$

for all fixed $s > 0$, since on applying this with two choices $0 < s_1 < s_2$ of $s$,
subtracting, and then dividing by $s_2 - s_1$ we see that

$E \min\left( \frac{(X - s_1)_+}{s_2 - s_1}, 1 \right) = \int_{s_1}^\infty \min\left( \frac{y - s_1}{s_2 - s_1}, 1 \right) p(y)\,dy + o(1)$;

letting $s_1, s_2$ approach a given value $s$ from the left or right, we then conclude the
bounds

$\int_{s+\delta}^\infty p(y)\,dy - o(1) \le P(X > s) \le \int_{s-\delta}^\infty p(y)\,dy + o(1)$

for any fixed $\delta > 0$, and (20) follows from the monotone convergence theorem.
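The subtraction step rests on an elementary pointwise identity, which the following sketch checks numerically on a grid. The function names and the sample values of $s_1, s_2$ are hypothetical choices made for illustration. The sketch also checks the sandwich between two indicator functions that turns the displayed expectation into upper and lower bounds for $P(X > s)$.

```python
def pos(t):
    """Positive part t_+ = max(t, 0)."""
    return max(t, 0.0)

def difference_quotient(x, s1, s2):
    """((x - s1)_+ - (x - s2)_+) / (s2 - s1): what one gets by subtracting and dividing."""
    return (pos(x - s1) - pos(x - s2)) / (s2 - s1)

def clipped_ramp(x, s1, s2):
    """min((x - s1)_+ / (s2 - s1), 1): the integrand in the displayed identity."""
    return min(pos(x - s1) / (s2 - s1), 1.0)

s1, s2 = 1.0, 1.5
grid = [k * 0.01 for k in range(500)]

max_gap = max(abs(difference_quotient(x, s1, s2) - clipped_ramp(x, s1, s2)) for x in grid)

# The ramp lies between the indicators of {x >= s2} and {x > s1}.
sandwiched = all(
    (1.0 if x >= s2 else 0.0) <= clipped_ramp(x, s1, s2) <= (1.0 if x > s1 else 0.0)
    for x in grid
)
print(max_gap, sandwiched)
```

Taking expectations in the sandwich gives $P(X \ge s_2) \le E \min(\cdot, 1) \le P(X > s_1)$, which is what is used when $s_1, s_2$ are moved to either side of $s$.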
It remains to prove (21). By (4), the right-hand side of (21) can be written as

$\det(1 - 1_{[0,s]} P_{\mathrm{Sine}} 1_{[0,s]}) + o(1).$

Meanwhile, if we introduce the normalised random matrix

$\tilde M_n := \frac{M_n - u \sqrt{n}}{1/(\sqrt{n} \rho_{sc}(u))}$

then we have

$X = \lambda_{i+1}(\tilde M_n) - \lambda_i(\tilde M_n).$
For any fixed choice of $M_n$, we observe the identity

$(X - s)_+ = \int_{\mathbb{R}} 1_{N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0}\,dx$

since the set of real numbers $x$ for which
$N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0$ holds is an interval of
length $X - s$ when $X > s$, and empty otherwise. Taking expectations and using the
Fubini-Tonelli theorem, we conclude that

$E (X - s)_+ = \int_{\mathbb{R}} P( N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0 )\,dx.$
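The deterministic identity behind this step can be checked on a small example. In the sketch below the spectrum `eigs` is a hypothetical sorted list of five numbers playing the role of the eigenvalues of the normalised matrix, and the Lebesgue measure of the set of admissible $x$ is approximated by a midpoint Riemann sum; the helper names are illustrative.

```python
def count_below(eigs, x):
    """N_{(-inf, x)}: number of eigenvalues strictly less than x."""
    return sum(1 for lam in eigs if lam < x)

def count_in(eigs, a, b):
    """N_{[a, b]}: number of eigenvalues in the closed interval [a, b]."""
    return sum(1 for lam in eigs if a <= lam <= b)

def measure_of_good_x(eigs, i, s, lo, hi, steps=50000):
    """Riemann sum for the length of {x : N_(-inf,x) = i and N_[x,x+s] = 0}."""
    h = (hi - lo) / steps
    good = 0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        if count_below(eigs, x) == i and count_in(eigs, x, x + s) == 0:
            good += 1
    return good * h

def gap_excess(eigs, i, s):
    """(X - s)_+ with X = lambda_{i+1} - lambda_i, eigenvalues indexed from 1."""
    return max(eigs[i] - eigs[i - 1] - s, 0.0)

# Hypothetical spectrum; with i = 3 the gap is lambda_4 - lambda_3 = 0.9.
eigs = [-1.3, -0.4, 0.1, 1.0, 2.7]
i = 3
for s in (0.25, 1.0):
    print(s, measure_of_good_x(eigs, i, s, -2.0, 3.0), gap_excess(eigs, i, s))
```

For $s = 0.25$ the admissible set is the interval $(0.1, 0.75)$, and for $s = 1.0$ it is empty, matching $(X - s)_+$ in both cases.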

Our task is thus to show that

(22)    $\int_{\mathbb{R}} P( N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0 )\,dx = \det(1 - 1_{[0,s]} P_{\mathrm{Sine}} 1_{[0,s]}) + o(1).$

Let $t_n := \log^{0.6} n$ (say). We will shortly establish the following claims:

(i) (Tail estimate) We have

(23)    $\int_{|x| > t_n} P( N_{(-\infty,x)}(\tilde M_n) = i )\,dx = o(1).$

(ii) (Approximate independence) For $|x| < t_n$, one has

(24)    $P( N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0 ) = P( N_{(-\infty,x)}(\tilde M_n) = i ) P( N_{[x,x+s]}(\tilde M_n) = 0 ) + O(\log^{-0.85} n).$

(iii) (Gap probability at fixed energy) For $|x| < t_n$, one has

(25)    $P( N_{[x,x+s]}(\tilde M_n) = 0 ) = \det(1 - 1_{[0,s]} P_{\mathrm{Sine}} 1_{[0,s]}) + o(1).$

(iv) (Central limit theorem) For $|x| < t_n$, one has

(26)    $P( N_{(-\infty,x)}(\tilde M_n) = i ) = \frac{1}{\sqrt{2\pi} \sigma} e^{-x^2/2\sigma^2} + O(\log^{-0.85} n)$

where $\sigma := \sqrt{\log n / 2\pi^2}$.

Let us assume these estimates for the moment. From (24), (25), (26) one has

$P( N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0 ) = \frac{1}{\sqrt{2\pi} \sigma} e^{-x^2/2\sigma^2} ( \det(1 - 1_{[0,s]} P_{\mathrm{Sine}} 1_{[0,s]}) + o(1) ) + O(\log^{-0.85} n)$

for $|x| < t_n$. Since

$\int_{|x| < t_n} \frac{1}{\sqrt{2\pi} \sigma} e^{-x^2/2\sigma^2}\,dx = 1 - o(1)$

we conclude from the choice of $t_n$ that

$\int_{|x| < t_n} P( N_{(-\infty,x)}(\tilde M_n) = i \wedge N_{[x,x+s]}(\tilde M_n) = 0 )\,dx = \det(1 - 1_{[0,s]} P_{\mathrm{Sine}} 1_{[0,s]}) + o(1)$

and the claim (22) then follows from (23).
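The role of the exponents 0.6 and 0.85 can be made concrete with a short computation. The sketch below uses only the definitions of $t_n$ and $\sigma$ from the text; the function names are illustrative, and the implied constant in the $O(\log^{-0.85} n)$ error is simply set to 1, which is an assumption made for display purposes. It evaluates the Gaussian mass of the window $|x| < t_n$ and the total error accumulated over that window.

```python
from math import erf, log, pi, sqrt

def window_and_sigma(n):
    """t_n = log^{0.6} n and sigma = sqrt(log n / (2 pi^2)), as in the text."""
    t_n = log(n) ** 0.6
    sigma = sqrt(log(n) / (2 * pi ** 2))
    return t_n, sigma

def gaussian_mass(n):
    """Integral over |x| < t_n of the N(0, sigma^2) density."""
    t_n, sigma = window_and_sigma(n)
    return erf(t_n / (sigma * sqrt(2)))

def accumulated_error(n):
    """Window length 2 t_n times the pointwise error log^{-0.85} n (constant set to 1)."""
    t_n, _ = window_and_sigma(n)
    return 2 * t_n * log(n) ** -0.85

for n in (10**3, 10**6, 10**12, 10**100):
    print(n, 1 - gaussian_mass(n), accumulated_error(n))
```

The window is wider than $\sigma$ by a factor of order $\log^{0.1} n$, so the Gaussian mass tends to 1, while the accumulated error is $2 \log^{-0.25} n$, which tends to zero but only very slowly.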

It remains to establish the estimates (23), (24), (25), (26). We begin with (23).
We can rewrite

$N_{(-\infty,x)}(\tilde M_n) = N_{(-\infty,\, u\sqrt{n} + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n).$

From the rigidity of eigenvalues of GUE (see$^{8}$ e.g. [28, Corollary 5]) we know that

$P( N_{(-\infty,y)}(M_n) = i ) \ll n^{-100}$

(say) unless

$y = u\sqrt{n} + O(\log^{O(1)} n / \sqrt{n}).$

Because of this, to prove (23) we may restrict to the regime where $x = O(\log^{O(1)} n)$.

By Lemma 5$^{9}$, for any real number $y$, $N_{(-\infty,y)}(M_n)$ is the sum of $n$
independent Bernoulli variables. The mean and variance of such random variables were
computed in [14]. Indeed, from [14, Lemma 2.1] one has (after adjusting the normalisation)

$E N_{(-\infty,y)}(M_n) = n \int_{-\infty}^{y/\sqrt{n}} \rho_{sc}(t)\,dt + O\left( \frac{\log n}{n} \right)$

while from [14, Lemma 2.3] one has

$\mathrm{Var}\, N_{(-\infty,y)}(M_n) = \left( \frac{1}{2\pi^2} + o(1) \right) \log n.$

Renormalising (and using the hypothesis $x = O(\log^{O(1)} n)$), we conclude that

$E N_{(-\infty,x)}(\tilde M_n) = i + x + O(1)$

and

$\mathrm{Var}\, N_{(-\infty,x)}(\tilde M_n) = \left( \frac{1}{2\pi^2} + o(1) \right) \log n.$
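The renormalisation of the mean can be illustrated numerically. The sketch below assumes the standard semicircular density $\rho_{sc}(t) = \frac{1}{2\pi}\sqrt{4 - t^2}$ on $[-2, 2]$ and uses its closed-form distribution function; the function names and sample values are hypothetical. It checks that moving the energy from $u\sqrt{n}$ to $u\sqrt{n} + x/(\sqrt{n}\rho_{sc}(u))$ changes the leading term $n \int \rho_{sc}$ by approximately $x$.

```python
from math import asin, pi, sqrt

def rho_sc(t):
    """Semicircular density on [-2, 2] (assumed normalisation)."""
    return sqrt(4 - t * t) / (2 * pi) if abs(t) < 2 else 0.0

def semicircle_cdf(t):
    """Integral of rho_sc over (-inf, t], in closed form."""
    if t <= -2:
        return 0.0
    if t >= 2:
        return 1.0
    return 0.5 + t * sqrt(4 - t * t) / (4 * pi) + asin(t / 2) / pi

def mean_count_shift(n, u, x):
    """n * (F(u + x / (n rho_sc(u))) - F(u)): change in the leading-order mean count."""
    return n * (semicircle_cdf(u + x / (n * rho_sc(u))) - semicircle_cdf(u))

n, u = 10**6, 0.5
for x in (-3.0, 0.5, 3.0):
    print(x, mean_count_shift(n, u, x))
```

The discrepancy from $x$ is of order $x^2/n$, which is negligible in the regime $x = O(\log^{O(1)} n)$.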
Applying Bennett's inequality (see [3]), we conclude that

$P( N_{(-\infty,x)}(\tilde M_n) = i ) \ll \exp( -c |x| / \sqrt{\log n} )$

for some absolute constant $c > 0$, which gives (23). The bound (26) follows from the
same computations, using Lemma 7 (or Corollary 8) in place of Bennett's inequality.
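To see Bennett's inequality at work on a sum of independent Bernoulli variables, the following sketch compares the exact two-sided tail of such a sum with the Bennett bound $2\exp(-v\,h(t/v))$, where $v$ is the variance and $h(u) = (1+u)\log(1+u) - u$. The success probabilities `ps` are hypothetical and are not those arising from GUE; the sketch only illustrates the inequality itself.

```python
from math import exp, log

def poisson_binomial_pmf(ps):
    """Exact law of a sum of independent Bernoulli(p_k) variables, by dynamic programming."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def two_sided_tail(pmf, mean, t):
    """P(|S - E S| >= t), computed from the exact law."""
    return sum(mass for k, mass in enumerate(pmf) if abs(k - mean) >= t)

def bennett_bound(variance, t):
    """Two-sided Bennett bound 2 exp(-v h(t / v)) for summands with |X - EX| <= 1."""
    u = t / variance
    return 2 * exp(-variance * ((1 + u) * log(1 + u) - u))

# Hypothetical success probabilities in (0, 1).
ps = [((7 * k) % 23 + 1) / 25 for k in range(40)]
pmf = poisson_binomial_pmf(ps)
mean = sum(ps)
variance = sum(p * (1 - p) for p in ps)

for t in range(1, 11):
    print(t, two_sided_tail(pmf, mean, t), bennett_bound(variance, t))
```

In the text the deviation is of size $|x|$ and the variance is comparable to $\log n$, which is the regime in which the bound takes the form $\exp(-c|x|/\sqrt{\log n})$ for $|x| \ge t_n$.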

The estimate (25) is well known (see$^{10}$ e.g. [1, Theorem 3.1.1]); for future reference
we remark that this estimate also implies the crude lower bound

(27)    $P( N_{[x,x+s]}(\tilde M_n) = 0 ) \gg 1$

for $n$ sufficiently large. We therefore turn to (24). By (27) and (26), it suffices to
establish the conditional probability estimate

(28)    $P( N_{(-\infty,x)}(\tilde M_n) = i \mid N_{[x,x+s]}(\tilde M_n) = 0 ) = \frac{1}{\sqrt{2\pi} \sigma} e^{-x^2/2\sigma^2} + O(\log^{-0.85} n).$

$^{8}$ One can also derive this rigidity from the Bennett's inequality argument given below.
One could also use the rigidity results for more general Wigner matrices here, see [13] or
[28], though this would be overkill.

$^{9}$ Strictly speaking, Lemma 5 is not applicable as stated because $(-\infty, y)$ is not a
compact interval, but this can be addressed by the usual truncation argument, replacing
$(-\infty, y)$ with $(-M, y)$ and then letting $M$ go to infinity, exploiting the
exponential decay of $M_n$. We omit the routine details.

$^{10}$ Strictly speaking, Theorem 3.1.1 of [1] only treats the case $u = 0$, but the general
case $-2 + \epsilon < u < 2 - \epsilon$ follows from the same methods; see [1, Exercise 3.7.5].

We now turn to (28). Recall that the eigenvalues of $M_n$ form a determinantal point
process with kernel $K^{(n)}$ given by (1). Rescaling this, we see that the eigenvalues
of $\tilde M_n$ form a determinantal point process with kernel $\tilde K^{(n)}$ given by the
formula

$\tilde K^{(n)}(x, y) := \frac{1}{\rho_{sc}(u) \sqrt{n}} K^{(n)}\left( u\sqrt{n} + \frac{x}{\rho_{sc}(u) \sqrt{n}},\, u\sqrt{n} + \frac{y}{\rho_{sc}(u) \sqrt{n}} \right).$
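The rescaling can be explored numerically. The sketch below makes an assumption that should be checked against (1), which is not restated in this section: it takes the unrescaled kernel to be the Christoffel-Darboux sum of the first $n$ Hermite functions orthonormal in $L^2(\mathbb{R})$ for the weight $e^{-x^2/2}$, the usual normalisation when the bulk spectrum fills $[-2\sqrt{n}, 2\sqrt{n}]$. All function names are illustrative. It then compares the rescaled kernel at $u = 0$ with the sine kernel $\sin(\pi(x-y))/(\pi(x-y))$.

```python
from math import exp, pi, sin, sqrt

def hermite_functions(n, x):
    """phi_0(x), ..., phi_{n-1}(x), via the stable three-term recurrence
    phi_{k+1} = (x phi_k - sqrt(k) phi_{k-1}) / sqrt(k + 1)."""
    phi = [exp(-x * x / 4) / (2 * pi) ** 0.25]
    if n > 1:
        phi.append(x * phi[0])
    for k in range(1, n - 1):
        phi.append((x * phi[k] - sqrt(k) * phi[k - 1]) / sqrt(k + 1))
    return phi

def gue_kernel(n, x, y):
    """Assumed form of the unrescaled kernel: sum of phi_k(x) phi_k(y) over k < n."""
    return sum(a * b for a, b in zip(hermite_functions(n, x), hermite_functions(n, y)))

def rho_sc(u):
    return sqrt(4 - u * u) / (2 * pi)

def rescaled_kernel(n, u, x, y):
    """Kernel of the rescaled point process, following the displayed formula."""
    scale = rho_sc(u) * sqrt(n)
    return gue_kernel(n, u * sqrt(n) + x / scale, u * sqrt(n) + y / scale) / scale

def sine_kernel(x, y):
    d = pi * (x - y)
    return 1.0 if d == 0 else sin(d) / d

n = 200
for (x, y) in [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]:
    print((x, y), rescaled_kernel(n, 0.0, x, y), sine_kernel(x, y))
```

Under the stated assumption, already at $n = 200$ the two kernels agree closely near the origin, which is the local uniform convergence used later in the argument.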

This is the kernel of an orthogonal projection $\tilde P^{(n)}$ to some $n$-dimensional
subspace $\tilde V^{(n)}$ in $L^2(\mathbb{R})$. The elements of this subspace consist of
polynomial multiples of a gaussian function, and in particular there is no non-trivial
element of $\tilde V^{(n)}$ that vanishes on $[x, x+s]$. Applying Corollary 8 (and a
truncation argument to deal with the non-compact nature of $(-\infty, x)$), one has

$P( N_{(-\infty,x)}(\tilde M_n) = i \mid N_{[x,x+s]}(\tilde M_n) = 0 ) = \frac{1}{\sqrt{2\pi} \sigma'} e^{-(i - \mu')^2 / 2(\sigma')^2} + O((\sigma')^{-1.7})$

where

$\mu' := \operatorname{trace}( P' 1_{(-\infty,x)} )$

and

$(\sigma')^2 := \operatorname{trace}( P' 1_{(-\infty,x)^c} P' 1_{(-\infty,x)} ),$

and $P'$ is the orthogonal projection to $1_{[x,x+s]^c} \tilde V^{(n)}$. To establish (28),
it will thus suffice to establish the bounds

$\mu' = i + x + O(1)$
and 

$(\sigma')^2 = \sigma^2 + O(1).$

To do this, we will use Corollary 10, with $J := [x, x+s]$, and the role of $P_0$
being played by the Dyson projection $P_{\mathrm{Sine}}$. From the well-known fact that a
non-trivial function and its Fourier transform cannot both be compactly supported, we
see that there is no non-trivial function in the range of $P_{\mathrm{Sine}}$ supported in
$J$. As $P_{\mathrm{Sine}}$ is locally trace class, we conclude from the Fredholm alternative
(see e.g. [22, Theorem VI.14]) that the compact operator $K_0$ defined by (15) exists. As
$K_0$ is independent of $n$, we certainly have$^{11}$

$\|K_0\|_{op} \ll 1$

and similarly

(29)    $\|1_J P_{\mathrm{Sine}}\|_{HS} \ll 1.$

By Corollary 10 (once again using a truncation argument to deal with the half-infinite
nature of $(-\infty, x)$), it will thus suffice to show that

(30)    $\|1_J \tilde P^{(n)}\|_{HS} \ll 1$

and

(31)    $\|(P_{\mathrm{Sine}} - \tilde P^{(n)}) 1_J\|_{op} = o(1).$



$^{11}$ Note that our bound here on $\|K_0\|_{op}$ is ineffective, as it relies on the
Fredholm alternative. However, it is quite probable that one can obtain an effective bound
on $K_0$ here by using a quantitative version of Hardy's uncertainty principle to give a
more robust version of the assertion that a non-trivial function and its Fourier transform
cannot both be compactly supported. We will not pursue this issue here.

Since the Hilbert-Schmidt norm controls the operator norm, we see from (29) that
(30), (31) will both follow from the bound

$\|(P_{\mathrm{Sine}} - \tilde P^{(n)}) 1_J\|_{HS} = o(1).$

Using the integral kernels $K_{\mathrm{Sine}}, \tilde K^{(n)}$ of $P_{\mathrm{Sine}}, \tilde P^{(n)}$
and the compact nature of $J$, it suffices to show that

(32)    $\int_{\mathbb{R}} |K_{\mathrm{Sine}}(x, y) - \tilde K^{(n)}(x, y)|^2\,dx = o(1)$

uniformly for all $y \in J$. In principle one could establish this bound from a
sufficiently precise analysis of the asymptotics of Hermite polynomials (such as those
given in [8]), but one can actually derive this bound from the standard convergence
result (2) as follows. From (2) we know that $\tilde K^{(n)}(x, y)$ converges locally
uniformly in $x, y$ to $K_{\mathrm{Sine}}(x, y)$ as $n \to \infty$, and so

(33)    $\int_{-L}^{L} |K_{\mathrm{Sine}}(x, y) - \tilde K^{(n)}(x, y)|^2\,dx = o(1)$

for any fixed $L$. Also, as $P_{\mathrm{Sine}}, \tilde P^{(n)}$ are both projections, one has

$\int_{\mathbb{R}} |K_{\mathrm{Sine}}(x, y)|^2\,dx = K_{\mathrm{Sine}}(y, y)$

and

$\int_{\mathbb{R}} |\tilde K^{(n)}(x, y)|^2\,dx = \tilde K^{(n)}(y, y).$

From (2), one has

$\tilde K^{(n)}(y, y) = K_{\mathrm{Sine}}(y, y) + o(1).$
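For any given $\epsilon > 0$, one can find an $L$ such that

(34)    $\int_{|x| > L} |K_{\mathrm{Sine}}(x, y)|^2\,dx = O(\epsilon)$

and thus

$\int_{\mathbb{R}} |\tilde K^{(n)}(x, y)|^2\,dx = \int_{-L}^{L} |K_{\mathrm{Sine}}(x, y)|^2\,dx + O(\epsilon) + o(1).$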
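The choice of $L$ in (34) can be made quantitative for the sine kernel. The sketch below assumes the standard formula $K_{\mathrm{Sine}}(x, y) = \sin(\pi(x - y))/(\pi(x - y))$ and, by translation invariance, takes $y = 0$; the function names are illustrative. It uses the projection identity $\int_{\mathbb{R}} |K_{\mathrm{Sine}}(x, 0)|^2\,dx = K_{\mathrm{Sine}}(0, 0) = 1$ to compute the tail mass outside $[-L, L]$ as one minus a trapezoid-rule integral over the window.

```python
from math import pi, sin

def sine_kernel_sq(x):
    """|K_Sine(x, 0)|^2 for the assumed kernel sin(pi x) / (pi x)."""
    if x == 0:
        return 1.0
    return (sin(pi * x) / (pi * x)) ** 2

def integral_on_window(L, step=1e-3):
    """Trapezoid rule for the integral of |K_Sine(x, 0)|^2 over [-L, L]."""
    m = round(2 * L / step)
    h = 2 * L / m
    total = 0.5 * (sine_kernel_sq(-L) + sine_kernel_sq(L))
    total += sum(sine_kernel_sq(-L + k * h) for k in range(1, m))
    return total * h

def tail_mass(L):
    """Integral over |x| > L, using that the integral over the whole line is 1."""
    return 1.0 - integral_on_window(L)

for L in (5, 10, 20, 40):
    print(L, tail_mass(L), 1 / (pi ** 2 * L))
```

The tail is close to $1/(\pi^2 L)$, so in this model case taking $L$ of order $1/\epsilon$ suffices for (34).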

But from (33) and the triangle inequality we have

$\int_{-L}^{L} |\tilde K^{(n)}(x, y)|^2\,dx = \int_{-L}^{L} |K_{\mathrm{Sine}}(x, y)|^2\,dx + o(1)$

and so

$\int_{|x| > L} |\tilde K^{(n)}(x, y)|^2\,dx = O(\epsilon) + o(1).$

From this, (33), and (34) we conclude that

$\int_{\mathbb{R}} |K_{\mathrm{Sine}}(x, y) - \tilde K^{(n)}(x, y)|^2\,dx = O(\epsilon) + o(1)$

and the claim (32) follows by sending $\epsilon$ to zero. The proof of Theorem 3 (and thus
also Corollary 4) is now complete.